We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 Challenge Task 2: "Unsupervised anomalous sound detection (ASD) for machine condition monitoring applying domain generalization techniques". Domain shifts are a critical problem for the application of ASD systems: because a domain shift can change the acoustic characteristics of the data, a model trained on a source domain performs poorly on a target domain. In the DCASE 2021 Challenge Task 2, we organized an ASD task for handling domain shifts, in which the occurrence of each domain shift was assumed to be known. In practice, however, the domain of each sample may not be given, and domain shifts can occur implicitly. In the 2022 Task 2, we focus on domain generalization techniques that detect anomalies regardless of domain shifts. Specifically, the domain of each sample is not given in the test data, and only a single threshold is allowed for all domains. We will add the challenge results and an analysis of the submissions after the challenge submission deadline.
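As a minimal illustration of the single-threshold constraint described above, the sketch below scores test clips without ever seeing a domain label and applies one threshold chosen from normal training data alone. The percentile rule and all names are illustrative assumptions, not the official DCASE baseline.

```python
# A sketch of domain-agnostic decision making: every test clip, whether from
# the source or target domain, is scored and thresholded identically.
import numpy as np

def decide_anomalies(scores: np.ndarray, normal_scores: np.ndarray,
                     percentile: float = 90.0) -> np.ndarray:
    """Flag clips whose anomaly score exceeds a single, domain-agnostic
    threshold derived from scores on held-out normal data."""
    threshold = np.percentile(normal_scores, percentile)
    return scores > threshold

rng = np.random.default_rng(0)
normal_scores = rng.normal(0.0, 1.0, 500)  # scores on held-out normal clips
test_scores = rng.normal(0.5, 1.2, 100)    # mixed-domain test scores
print(decide_anomalies(test_scores, normal_scores).sum(), "clips flagged")
```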
This paper aims to develop an unsupervised anomaly detection method for automatic machine monitoring based on acoustic signals. Existing approaches such as the deep autoencoder (DAE), variational autoencoder (VAE), and conditional variational autoencoder (CVAE) have limited representation capability in the latent space and, consequently, poor anomaly detection performance; a different model must be trained for each type of machine to perform the anomaly detection task accurately. To solve this issue, we propose a new method named Hierarchical Conditional Variational Autoencoder (HCVAE). This method utilizes available taxonomic hierarchical knowledge about industrial facilities to refine the latent space representation. This knowledge also helps the model improve its anomaly detection performance. We demonstrate the generalization capability of a single HCVAE model to different types of machines by using appropriate conditions. Additionally, to show the practicality of the proposed method, (i) we evaluate the HCVAE model on different domains, and (ii) we examine the effect of partial hierarchical knowledge. Our results show that the HCVAE method validates both of these points and outperforms the baseline system on the anomaly detection task by up to 15% on the AUC score metric.
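The sketch below illustrates, under stated assumptions, how hierarchical taxonomy labels could condition a single VAE in the spirit of the HCVAE: a two-level taxonomy (machine family, then machine type) is encoded as concatenated one-hot vectors fed to both encoder and decoder. Layer sizes, taxonomy depth, and the unweighted loss are illustrative choices, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierCondVAE(nn.Module):
    def __init__(self, x_dim=128, z_dim=16, n_families=3, n_types=7):
        super().__init__()
        self.n_families, self.n_types = n_families, n_types
        c_dim = n_families + n_types  # one-hot codes for both hierarchy levels
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, z_dim)
        self.logvar = nn.Linear(64, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, 64), nn.ReLU(),
                                 nn.Linear(64, x_dim))

    def forward(self, x, family, mtype):
        # Condition on the full taxonomy path: machine family -> machine type.
        c = torch.cat([F.one_hot(family, self.n_families).float(),
                       F.one_hot(mtype, self.n_types).float()], dim=-1)
        h = self.enc(torch.cat([x, c], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        x_hat = self.dec(torch.cat([z, c], dim=-1))
        recon = F.mse_loss(x_hat, x)  # used as the anomaly score at test time
        kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + kld

model = HierCondVAE()
loss = model(torch.randn(8, 128),
             torch.randint(0, 3, (8,)), torch.randint(0, 7, (8,)))
```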
A method to perform offline and online speaker diarization for an unlimited number of speakers is described in this paper. End-to-end neural diarization (EEND) has achieved overlap-aware speaker diarization by formulating it as a multi-label classification problem. It has also been extended for a flexible number of speakers by introducing speaker-wise attractors. However, the output number of speakers of attractor-based EEND is empirically capped; it cannot deal with cases where the number of speakers appearing during inference is higher than that during training because its speaker counting is trained in a fully supervised manner. Our method, EEND-GLA, solves this problem by introducing unsupervised clustering into attractor-based EEND. In the method, the input audio is first divided into short blocks, then attractor-based diarization is performed for each block, and finally, the results of each block are clustered on the basis of the similarity between locally-calculated attractors. While the number of output speakers is limited within each block, the total number of speakers estimated for the entire input can be higher than the limitation. To use EEND-GLA in an online manner, our method also extends the speaker-tracing buffer, which was originally proposed to enable online inference of conventional EEND. We introduce a block-wise buffer update to make the speaker-tracing buffer compatible with EEND-GLA. Finally, to improve online diarization, our method improves the buffer update method and revisits the variable chunk-size training of EEND. The experimental results demonstrate that EEND-GLA can perform speaker diarization of an unseen number of speakers in both offline and online inferences.
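A minimal sketch of the inter-block linking step: attractors computed locally in each block are clustered by similarity so that the same speaker receives one global identity across blocks, letting the total speaker count exceed any per-block cap. Using scikit-learn's agglomerative clustering with a cosine-distance threshold is an illustrative stand-in for the paper's clustering procedure.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering  # scikit-learn >= 1.2

def link_speakers_across_blocks(block_attractors, distance_threshold=0.5):
    """block_attractors: list of (n_speakers_in_block, dim) arrays of
    locally-calculated attractors. Returns, per block, the global speaker
    IDs assigned to its local attractors."""
    flat = np.concatenate(block_attractors, axis=0)
    labels = AgglomerativeClustering(
        n_clusters=None, distance_threshold=distance_threshold,
        metric="cosine", linkage="average").fit_predict(flat)
    out, i = [], 0
    for a in block_attractors:
        out.append(labels[i:i + len(a)])
        i += len(a)
    return out  # total unique speakers may exceed any per-block limit
```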
An onomatopoeic word, a character sequence that phonetically imitates a sound, is effective in expressing characteristics of a sound such as its duration, pitch, and timbre. We propose an environmental-sound-extraction method that uses onomatopoeic words to specify the target sound to be extracted. With this method, we estimate a time-frequency mask from an input mixture spectrogram and an onomatopoeic word using a U-Net architecture, then extract the corresponding target sound by masking the spectrogram. Experimental results show that the proposed method extracts only the target sound corresponding to the onomatopoeic word and performs better than conventional methods that use sound-event classes to specify the target sound.
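The mask-and-extract step can be sketched as follows; the U-Net and the onomatopoeia encoder are stubbed out, and `mask_net` is a placeholder assumption standing in for the trained model.

```python
import numpy as np

def extract_target(mixture_spec: np.ndarray, onomatopoeia_emb: np.ndarray,
                   mask_net) -> np.ndarray:
    """mixture_spec: (freq, time) magnitude spectrogram.
    mask_net predicts a time-frequency mask in [0, 1] from the mixture
    and the onomatopoeic query; masking yields the target spectrogram."""
    mask = mask_net(mixture_spec, onomatopoeia_emb)
    return mask * mixture_spec

# Illustrative stand-in for the trained U-Net: a constant soft mask.
dummy_net = lambda spec, emb: np.full_like(spec, 0.5)
spec = np.abs(np.random.randn(257, 100))
target = extract_target(spec, np.zeros(128), dummy_net)
```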
Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. We then propose a new method to improve Mixup based on this insight. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across various datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.
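For reference, the baseline Mixup operation analyzed above can be written in a few lines; this is standard Mixup, not the paper's improved variant.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Convex combinations of input pairs and their one-hot labels,
    with the mixing weight drawn from Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix
```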
The external visual inspection of rolling stock's underfloor equipment is currently performed by human inspectors. In this study, we attempt to partly automate this inspection by investigating anomaly-inspection algorithms that use image processing technology. Because railroad maintenance studies tend to have little anomaly data, unsupervised learning methods are usually preferred for anomaly detection; however, training cost and accuracy remain challenges. Previous studies have created anomalous images from normal images by adding noise, but the anomaly targeted in this study, the rotation of piping cocks, is difficult to create using noise. Therefore, we propose a new method that applies style conversion via generative adversarial networks to three-dimensional computer graphics, imitating anomaly images so that anomaly detection can be trained with supervised learning. A geometry-consistent style-conversion model was used to convert the images; as a result, the color and texture of the generated images successfully imitate real images while maintaining the anomalous shape. Using the generated anomaly images as supervised data, the anomaly detection model can be trained easily without complex adjustments and successfully detects anomalies.
In this paper, we propose a novel architecture called Composition Attention Grammars (CAGs) that recursively compose subtrees into a single vector representation with a composition function, and selectively attend to previous structural information with a self-attention mechanism. We investigate whether these components -- the composition function and the self-attention mechanism -- can both induce human-like syntactic generalization. Specifically, we train language models (LMs) with and without these two components, with the model sizes carefully controlled, and evaluate their syntactic generalization performance against six test circuits on the SyntaxGym benchmark. The results demonstrated that the composition function and the self-attention mechanism both play an important role in making LMs more human-like, and closer inspection of linguistic phenomena implied that the composition function allowed syntactic features, but not semantic features, to percolate into subtree representations.
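A hedged sketch of what a composition function can look like: the vectors of a completed subtree's children are composed into a single subtree vector, here with a bidirectional LSTM in the spirit of RNNG-style composition. Dimensions and the pooling choice are illustrative assumptions, not the exact CAG parameterization.

```python
import torch
import torch.nn as nn

class Composer(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.bilstm = nn.LSTM(dim, dim, bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, children: torch.Tensor) -> torch.Tensor:
        """children: (1, n_children, dim) -> (1, dim) subtree vector."""
        out, _ = self.bilstm(children)
        # Summarize the children sequence into one vector for the subtree,
        # which then replaces the children on the stack.
        return torch.tanh(self.proj(out.mean(dim=1)))

subtree_vec = Composer()(torch.randn(1, 3, 64))  # compose 3 children
```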
Bayesian inference offers principled tools to tackle many critical problems with modern neural networks such as poor calibration and generalization, and data inefficiency. However, scaling Bayesian inference to large architectures is challenging and requires restrictive approximations. Monte Carlo dropout has been widely used as a relatively cheap way to perform approximate inference and estimate uncertainty with deep neural networks. Traditionally, the dropout mask is sampled independently from a fixed distribution. Recent works show that the dropout mask can be viewed as a latent variable, which can be inferred with variational inference. These methods face two important challenges: (a) the posterior distribution over masks can be highly multi-modal, which can be difficult to approximate with standard variational inference, and (b) it is not trivial to fully utilize sample-dependent information and correlation among dropout masks to improve posterior estimation. In this work, we propose GFlowOut to address these issues. GFlowOut leverages the recently proposed probabilistic framework of Generative Flow Networks (GFlowNets) to learn the posterior distribution over dropout masks. We empirically demonstrate that GFlowOut results in predictive distributions that generalize better to out-of-distribution data, and provides uncertainty estimates that lead to better performance in downstream tasks.
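The "dropout mask as a latent variable" view can be sketched as below: mask probabilities depend on the input rather than being fixed, and one mask is sampled per example. Plain factorized Bernoulli sampling stands in for the GFlowNet posterior that is the paper's actual contribution; training this distribution would additionally require a relaxation or a GFlowNet-style objective, since the Bernoulli sample itself is not differentiable.

```python
import torch
import torch.nn as nn

class SampleDependentDropout(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.mask_logits = nn.Linear(dim, dim)  # q(mask | x), factorized

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(self.mask_logits(h))
        mask = torch.bernoulli(probs)             # one latent mask per sample
        # Inverted-dropout rescaling with per-unit keep probabilities.
        # NOTE: bernoulli() blocks gradients; learning q(mask | x) needs a
        # relaxation or a GFlowNet-style objective, as in the paper.
        return h * mask / probs.clamp_min(1e-6)
```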
Deep neural networks perform well on prediction and classification tasks in the canonical setting where data streams are i.i.d., labeled data is abundant, and class labels are balanced. Challenges emerge with distribution shifts, including non-stationary or imbalanced data streams. One powerful approach that has addressed this challenge is self-supervised pretraining of large encoders on volumes of unlabeled data, followed by task-specific tuning. Given a new task, however, updating the weights of these encoders is challenging: a large number of weights needs to be fine-tuned, and as a result, the encoders forget information about the previous tasks. In the present work, we propose a model architecture to address this issue, building upon a discrete bottleneck that contains pairs of separate and learnable (key, value) codes. In this setup, we follow the encode-process-decode paradigm, where the representation is processed via the discrete bottleneck: the input is fed to the pretrained encoder, the output of the encoder is used to select the nearest keys, and the corresponding values are fed to the decoder to solve the current task. The model can only fetch and re-use a limited number of these (key, value) pairs during inference, enabling localized and context-dependent model updates. We theoretically investigate the ability of the proposed model to minimize the effect of distribution shifts and show that such a discrete bottleneck with (key, value) pairs reduces the complexity of the hypothesis class. We empirically verify the benefits of the proposed method under challenging distribution-shift scenarios across various benchmark datasets and show that the proposed model reduces the common vulnerability to non-i.i.d. and non-stationary training distributions compared to various other baselines.
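A minimal sketch of the discrete (key, value) bottleneck under stated assumptions: the frozen encoder's output selects its nearest key, and only the paired value is passed on to the decoder, so updates stay localized to the few pairs a task actually touches. Single-key (rather than multi-key) lookup and the sizes below are illustrative simplifications of the proposed architecture.

```python
import torch
import torch.nn as nn

class KeyValueBottleneck(nn.Module):
    def __init__(self, n_pairs=128, key_dim=64, value_dim=64):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_pairs, key_dim))
        self.values = nn.Parameter(torch.randn(n_pairs, value_dim))

    def forward(self, enc_out: torch.Tensor) -> torch.Tensor:
        """enc_out: (batch, key_dim) from the frozen pretrained encoder.
        Returns the selected values, (batch, value_dim), for the decoder."""
        dists = torch.cdist(enc_out, self.keys)  # distance to every key
        nearest = dists.argmin(dim=-1)           # discrete, localized choice
        return self.values[nearest]              # only touched values update
```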
This paper proves that robustness implies generalization via data-dependent generalization bounds. As a result, robustness and generalization are shown to be connected closely in a data-dependent manner. Our bounds improve previous bounds in two directions, addressing an open problem that has seen little development since 2010. The first is to reduce the dependence on the covering number; the second is to remove the dependence on the hypothesis space. We present several examples, including ones for lasso and deep learning, in which our bounds are provably preferable. Experiments on real-world data and theoretical models demonstrate near-exponential improvements in various situations. To achieve these improvements, we do not require additional assumptions on the unknown distribution; instead, we only incorporate an observable and computable property of the training samples. A key technical innovation is an improved concentration bound for multinomial random variables that is of independent interest beyond robustness and generalization.
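For context, the classical robustness-based generalization bound (Xu and Mannor, 2012) that this line of work sharpens can be stated as follows; the abstract above improves on both the covering-number dependence and the hypothesis-space dependence, and its exact improved bound is not reproduced here.

```latex
% Classical (K, \epsilon(\cdot))-robustness bound of Xu and Mannor (2012),
% shown for context only. Loss bounded by M; S is a set of n i.i.d. samples.
\[
\bigl|\mathcal{L}(h_S) - \hat{\mathcal{L}}(h_S)\bigr|
\;\le\; \epsilon(S) + M\sqrt{\frac{2K\ln 2 + 2\ln(1/\delta)}{n}}
\quad \text{with probability at least } 1-\delta .
\]
```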